DUSTer: A Method for Unraveling Cross-Language Divergences for Statistical Word-Level Alignment
Abstract
The frequent occurrence of divergences (structural differences between languages) presents a great challenge for statistical word-level alignment. In this paper, we introduce DUSTer, a method for systematically identifying common divergence types and transforming an English sentence structure to bear a closer resemblance to that of another language. Our ultimate goal is to enable more accurate alignment and projection of dependency trees in another language without requiring any training on dependency-tree data in that language. We present an empirical analysis comparing the complexities of performing word-level alignments with and without divergence handling. Our results suggest that our approach facilitates word-level alignment, particularly for sentence pairs containing divergences.
Similar resources
Divergence Unraveling for Word Alignment of Parallel Corpora
We describe the use of parallel text for divergence unraveling in word-level alignment. DUSTer (Divergence Unraveling for Statistical Translation) is a system that combines linguistic and statistical knowledge to resolve structural differences between languages, i.e., translation divergences, during the process of alignment. Our immediate goal is to induce word-level alignments that are more ac...
Combining Linguistic and Machine Learning Techniques for Word Alignment Improvement
Necip Fazıl Ayan, Doctor of Philosophy, 2005. Dissertation directed by Professor Bonnie J. Dorr, Department of Computer Science. Alignment of words, i.e., detection of corresponding units between two sentences that are translations of each other, has been shown to be crucial for the success ...
Improved Word-Level Alignment: Injecting Knowledge about MT Divergences
Word-level alignments of bilingual text (bitexts) are not only an integral part of statistical machine translation models, but also useful for lexical acquisition, treebank construction, and part-of-speech tagging. The frequent occurrence of divergences, structural differences between languages, presents a great challenge to the ...
Improving Bitext Word Alignments via Syntax-based Reordering of English
We present an improved method for automated word alignment of parallel texts which takes advantage of knowledge of syntactic divergences, while avoiding the need for syntactic analysis of the less resource rich language, and retaining the robustness of syntactically agnostic approaches such as the IBM word alignment models. We achieve this by using simple, easily-elicited knowledge to produce s...
Multi-align: Combining Linguistic and Statistical Techniques to Improve Alignments for Adaptable MT
The continuously growing MT market faces the challenge of translating new languages, diverse genres, and different domains using a variety of available linguistic resources. As such, MT system adaptability has become a sought-after necessity. An adaptable statistical or Hybrid MT system relies heavily on the quality of word-level alignments of real-world data. Statistical alignment approaches p...